Gender-Preferential Text Mining of E-mail Discourse
نویسندگان
چکیده
This paper describes an investigation of authorship gender attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for author gender categorisation.
منابع مشابه
Language and Gender Author Cohort Analysis of E-mail for Computer Forensics
We describe an investigation of authorship gender and language background cohort attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-m...
متن کاملارائه مدلی برای استخراج اطلاعات از مستندات متنی، مبتنی بر متنکاوی در حوزه یادگیری الکترونیکی
As computer networks become the backbones of science and economy, enormous quantities documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discoveries unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
متن کاملDetecting Contrast Patterns in Newspaper Articles by Combining Discourse Analysis and Text Mining
Text mining aims at constructing classification models and finding interesting patterns in large text collections. This paper investigates the utility of applying these techniques to media analysis, more specifically to support discourse analysis of news reports about the 2007 Kenyan elections and postelection crisis in local (Kenyan) and Western (British and US) newspapers. It illustrates how ...
متن کاملMining Discourse Markers For Chinese Textual Summarization
Discourse markers foreshadow the message thrust of texts and saliently guide their rhetorical structure which are important for content filtering and text abstraction. This paper reports on efforts to automatically identify and classify discourse markers in Chinese texts using heuristic-based and corpus-based data-mining methods, as an integral part of automatic text summarization via rhetorica...
متن کاملAutomatic discourse connective detection in biomedical text
OBJECTIVE Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised mac...
متن کامل